Towards Robust Neural Networks via Random Self-ensemble
Recent studies have revealed the vulnerability of deep neural networks: a
small adversarial perturbation that is imperceptible to humans can easily make a
well-trained deep neural network misclassify. This makes it unsafe to apply
neural networks in security-critical applications. In this paper, we propose a
new defense algorithm called Random Self-Ensemble (RSE) by combining two
important concepts: randomness and ensemble. To protect a targeted
model, RSE adds random noise layers to the neural network to prevent the strong
gradient-based attacks, and ensembles the prediction over random noises to
stabilize the performance. We show that our algorithm is equivalent to
ensembling an infinite number of noisy models without any additional memory
overhead, and that the proposed training procedure based on noisy stochastic
gradient descent ensures the ensemble model has good predictive
capability. Our algorithm significantly outperforms previous defense techniques
on real data sets. For instance, on CIFAR-10 with a VGG network (which has 92%
accuracy without any attack), under the strong C&W attack within a certain
distortion tolerance, the accuracy of the unprotected model drops to less than
10%, and our method retains substantially higher prediction accuracy than the
best previous defense technique under the same level of attack. Finally,
our method is simple and easy to integrate into any neural network.
Comment: ECCV 2018 camera ready
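A minimal PyTorch sketch of the two ingredients the abstract describes: a noise layer that draws fresh Gaussian noise at every forward pass (both training and inference), and an ensemble of predictions over several noise draws. The layer placement, noise scales, network size, and ensemble size below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    """Adds Gaussian noise to its input at both training and test time."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return x + self.sigma * torch.randn_like(x)

class NoisyCNN(nn.Module):
    """Toy CNN with a noise layer before each conv block (placement is illustrative).
    Assumes 32x32 inputs, e.g. CIFAR-10."""
    def __init__(self, num_classes=10, sigma_init=0.2, sigma_inner=0.1):
        super().__init__()
        self.features = nn.Sequential(
            NoiseLayer(sigma_init),
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            NoiseLayer(sigma_inner),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

@torch.no_grad()
def rse_predict(model, x, n_samples=20):
    """Average class probabilities over several random noise draws."""
    model.eval()
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0).argmax(dim=-1)
```

Because a fresh noise draw changes the gradient at every query, a gradient-based attacker effectively sees a different model each time, while averaging over draws keeps the clean accuracy stable.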
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven
to have some very appealing theoretical properties for non-convex optimization
by concurrently computing function value, gradient, and Hessian matrix to
obtain the next search direction and the adjusted parameters. Although
stochastic approximations greatly reduce the computational cost, it is
challenging to theoretically guarantee the convergence rate. In this paper, we
explore a family of stochastic TR and ARC methods that can simultaneously
provide inexact computations of the Hessian matrix, gradient, and function
values. Our algorithms require much less propagation overhead per iteration
than TR and ARC. We prove that the iteration complexity to achieve
ε-approximate second-order optimality is of the same order as that of the
exact computations demonstrated in previous studies. Additionally, the mild
conditions on inexactness can be met by leveraging a random sampling technique
in the finite-sum minimization problem. Numerical experiments with a non-convex
problem support these findings and demonstrate that, with the same or a similar
number of iterations, our algorithms require less computational overhead per
iteration than current second-order methods.
Comment: arXiv admin note: text overlap with arXiv:1809.0985
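The random-sampling idea behind the inexactness conditions can be illustrated on a finite-sum problem: the gradient and Hessian-vector products are estimated from random subsets of the component functions, and larger subsets give tighter probabilistic error bounds. A minimal NumPy sketch, using a logistic loss purely for illustration (the loss, sample sizes, and variable names are assumptions, and the TR/ARC subproblem that would consume these estimates is omitted):

```python
import numpy as np

# Toy finite-sum problem: f(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i @ w)).
rng = np.random.default_rng(0)
n, d = 10_000, 50
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

def subsampled_gradient(w, batch):
    """Inexact gradient from a random subset of component functions."""
    Xb, yb = X[batch], y[batch]
    p = 1.0 / (1.0 + np.exp(yb * (Xb @ w)))          # sigmoid(-y_i * x_i @ w)
    return -(Xb * (yb * p)[:, None]).mean(axis=0)

def subsampled_hessian_vec(w, v, batch):
    """Inexact Hessian-vector product from a (possibly different) random subset."""
    Xb, yb = X[batch], y[batch]
    s = 1.0 / (1.0 + np.exp(-yb * (Xb @ w)))
    curv = s * (1.0 - s)                              # per-sample curvature of the logistic loss
    return (Xb * (curv * (Xb @ v))[:, None]).mean(axis=0)

# Larger sample sizes tighten the (probabilistic) error of these estimates,
# which is how the paper's mild inexactness conditions can be met in practice.
w = np.zeros(d)
g = subsampled_gradient(w, rng.choice(n, size=512, replace=False))
Hv = subsampled_hessian_vec(w, g, rng.choice(n, size=256, replace=False))
```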
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Graph convolutional network (GCN) has been successfully applied to many
graph-based applications; however, training a large-scale GCN remains
challenging. Current SGD-based algorithms suffer from either a high
computational cost that grows exponentially with the number of GCN layers, or a
large space requirement for keeping the entire graph and the embedding of each
node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm
that is suitable for SGD-based training by exploiting the graph clustering
structure. Cluster-GCN works as follows: at each step, it samples a block
of nodes associated with a dense subgraph identified by a graph clustering
algorithm, and restricts the neighborhood search to within this subgraph. This
simple but effective strategy leads to significantly improved memory and
computational efficiency while achieving test accuracy comparable to that of
previous algorithms. To test the scalability of our algorithm, we create a
new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than
5 times larger than the previous largest publicly available dataset (Reddit).
For training a 3-layer GCN on this data, Cluster-GCN is faster than the
previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much
less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this
data, our algorithm can finish in around 36 minutes, while all the existing GCN
training algorithms fail due to out-of-memory issues. Moreover,
Cluster-GCN allows us to train much deeper GCNs without much time and memory
overhead, which leads to improved prediction accuracy: using a 5-layer
Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI
dataset, while the previous best result was 98.71 by [16]. Our code is
publicly available at
https://github.com/google-research/google-research/tree/master/cluster_gcn.
Comment: In Proceedings of the 25th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD'19)
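A minimal PyTorch sketch of the training loop the abstract describes: the node set is partitioned into clusters, and each SGD step runs a full GCN forward/backward pass on one cluster's induced subgraph only. The paper uses a graph clustering algorithm such as METIS; the random partition, toy graph, and two-layer model below are stand-ins to keep the sketch self-contained.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy graph: dense adjacency A, node features X, labels y.
n, f, c, num_parts = 1000, 64, 7, 10
A = (torch.rand(n, n) < 0.01).float()
A = ((A + A.t()) > 0).float()
X = torch.randn(n, f)
y = torch.randint(0, c, (n,))
# Stand-in for a graph clustering algorithm (the paper uses METIS-style partitions).
parts = [torch.tensor(p) for p in np.array_split(np.random.permutation(n), num_parts)]

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, A_sub, H):
        # Symmetrically normalized propagation restricted to the sampled subgraph.
        A_hat = A_sub + torch.eye(A_sub.size(0))
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
        return self.lin(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H)

layer1, layer2 = GCNLayer(f, 128), GCNLayer(128, c)
opt = torch.optim.Adam(list(layer1.parameters()) + list(layer2.parameters()), lr=1e-2)

for epoch in range(5):
    for part in parts:
        # Each step touches only one cluster's induced subgraph, so memory is
        # bounded by the cluster size rather than the full graph.
        A_sub, X_sub, y_sub = A[part][:, part], X[part], y[part]
        logits = layer2(A_sub, torch.relu(layer1(A_sub, X_sub)))
        loss = nn.functional.cross_entropy(logits, y_sub)
        opt.zero_grad(); loss.backward(); opt.step()
```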
ExpNote: Black-box Large Language Models are Better Task Solvers with Experience Notebook
Black-box Large Language Models (LLMs) have shown great power in solving
various tasks and are considered general problem solvers. However, LLMs still
fail in many specific tasks even though they understand the task instructions. In this
paper, we focus on the problem of boosting the ability of black-box LLMs to
solve downstream tasks. We propose ExpNote, an automated framework to help LLMs
better adapt to unfamiliar tasks by reflecting on and noting experiences from
training data and retrieving them from external memory during testing. We
evaluate ExpNote on multiple tasks and the experimental results demonstrate
that the proposed method significantly improves the performance of black-box
LLMs. The data and code are available at
https://github.com/forangel2014/ExpNote
Comment: EMNLP 2023 Findings
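The workflow the abstract describes (reflect on training mistakes, write short experience notes to an external memory, then retrieve relevant notes at test time and prepend them to the prompt) can be sketched as follows. The `call_llm` stub, prompt wording, and the crude lexical retrieval are placeholders for illustration, not the released ExpNote code, which defines its own prompts and retrieval.

```python
from difflib import SequenceMatcher

def call_llm(prompt: str) -> str:
    """Placeholder for a black-box LLM API call; plug in your own client here."""
    raise NotImplementedError

notebook: list[str] = []  # external memory of experience notes

def training_phase(examples):
    """Reflect on training mistakes and note the lessons as reusable experiences."""
    for question, gold in examples:
        answer = call_llm(f"Question: {question}\nAnswer:")
        if answer.strip() != gold:
            note = call_llm(
                f"You answered '{answer}' but the correct answer is '{gold}'.\n"
                f"Write one short, general note that would help on similar questions."
            )
            notebook.append(note.strip())

def retrieve(question: str, k: int = 3) -> list[str]:
    """Crude lexical retrieval from the notebook (a real system might use embeddings)."""
    scored = sorted(notebook,
                    key=lambda note: SequenceMatcher(None, note, question).ratio(),
                    reverse=True)
    return scored[:k]

def testing_phase(question: str) -> str:
    """Answer a new question with the retrieved experiences prepended to the prompt."""
    notes = "\n".join(retrieve(question))
    return call_llm(f"Useful experiences:\n{notes}\n\nQuestion: {question}\nAnswer:")
```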